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Abstract 

Evolvability by means of simple sequence repeat (SSR) instability is a feature under the constant influence of opposing 
selective pressures to expand and compress the repeat tract and is mechanistically influenced by factors that affect genetic 
instability. In addition to direct selection for protein expression and structural integrity, other factors that influence tract 
length evolution were studied. The genetic instability of SSRs that switch the expression of antibiotic resistance ON and OFF 
was modelled mathematically and monitored in a panel of live meningococcal strains. The mathematical model showed 
that the SSR length of a theoretical locus in an evolving population may be shaped by direct selection of expression status 
(ON or OFF), tract length dependent (a) and tract length independent factors {[I). According to the model an increase in a. 
drives the evolution towards shorter tracts. An increase in fi drives the evolution towards a normal distribution of tract 
lengths given that an upper and a lower limit are set. Insertion and deletion biases were shown to skew allelic distributions 
in both directions. The meningococcal SSR model was tested in vivo by monitoring the frequency of spectinomycin 
resistance OFF— >ON switching in a designed locus. The instability of a comprehensive panel of the homopolymeric SSRs, 
constituted of a range of 5-1 3 guanine nucleotides, was monitored in wildtype and mismatch repair deficient backgrounds. 
Both the repeat length itself and mismatch repair deficiency were shown to influence the genetic instability of the 
homopolymeric tracts. A possible insertion bias was observed in tracts =£G 10 . Finally, an inverse correlation between the 
number of tract-encoded amino acids and growth in the presence of ON-selection illustrated a limitation to SSR expansion 
in an essential gene associated with the designed model locus and the protein function mediating antibiotic resistance. 
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Introduction 

Genetic simple sequence repeats (SSRs) are inherently unstable 
[1,2]. Long homopolymeric repeats in codon pair usage are 
avoided in all kingdoms of life due to their mutability [3] . Yet, 
many microorganisms utilize such repeats to generate variability in 
genes under selection for diversification [4], typically antigenic 
epitopes in bacterial pathogens and commensals that are exposed 
to adaptive immune systems. The opportunistic pathogen Neisseria 
meningitidis (Mc) and the obligate pathogen N. gonorrhoeae have small 
hyper-dynamic genomes with elevated spontaneous mutation rates 
compared to other bacterial species [5-7]. High mutation rates 
give rise to a multitude of genomic variants that may expand the 
fitness landscape, depending on the selective forces. The hyper- 
mutability is essential to Mc adaptability at the cost of genomic 
stability [8]. Two major genetic mechanisms have evolved to drive 
this rapid genomic hyper-variation, namely slipped-strand mis- 
pairing during replication of repeats and intrachromosomal 
recombination between silent and expression loci, resulting in 
phase variation (PV) and antigenic variation, respectively [9,10]. 
Depending on the context, repeat-mediated instability may be 
under positive or negative selection causing either the establish- 



ment and maintenance or avoidance of repeats, respectively. 
Essential genes, of which expression is required for survival, tend 
not to harbour SSRs due to the inherent risk of causing alterations 
in the reading frame. Genetic variability can also be caused and 
amplified by defects in DNA repair (mutators), through variations 
in post-translational modifications of regulatory proteins and 
through horizontal gene transfer [1 1-15]. These sources of genetic 
variation may overlap and influence each other such as in the PV 
glycosylation gene network [16] and through the destabilizing 
effect transforming DNA has on poly guanine (polyG) tract 
stability [17]. In addition, transformation has been considered an 
important source for genetic variability [18]. 

Contingency loci, a collective term for genes subjected to PV, 
are generally involved in the structure or biosynthesis of surface 
proteins with simple sequence motifs, such as polynucleotide 
tracts, located in coding or promoter regions [19,20]. In PV genes, 
the expression may switch ON/ OFF status (reversible expression) 
by frameshift alterations or promoter efficacy depending on cis 
acting factors, the genetic composition of the repeat, or external 
trans acting factors [21-23]. The major cis acting factors are repeat 
number, repeat unit length, repeat sequence and purity, whereas 
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DNA-polymerases, RNase H and mismatch repair (MMR) are 
trans acting factors shown to affect PV frequencies in both 
prokaryotic and eukaryotic model systems [21,22,24-28]. Poly- 
meric tract length alterations in bacteria have been shown to be 
directionally biased depending on tract length and the nature of 
the tract [22,29]. A recent study of a homopolymeric PV loci in 
Campylobacter jejuni revealed a shift from an insertion bias for shorter 
polyG tract lengths (8-9 nt) to a deletion bias for longer polyG 
tract lengths (10-11 nt) [30]. SSR instability offer plasticity in a 
stochastic manner, spontaneously or as a response to environ- 
mental change [31], such as the host immune response [32,33], 
competition and bacteriophage infection [4,34], cationic antimi- 
crobial peptides (CAMPs) [35] or e.g. temperature and pH [36]. 
An accidental human passage of an Mc strain showed that more 
differences had occurred in PV genes during growth in vivo than in 
vitro, highlighting the influence of selection on Mc adaptation by 
means of PV [37]. The abundant Correia elements found in 
neisserial genomes have also been shown to provide a potential 
mechanism for PV by their unstable function as strong promoters 
[38]. PV may be considered a programmed event as the 
distribution of error-prone repeats is not random, making certain 
shifts in ON^OFF expression more likely to happen [32]. Based 
on comparative whole genome analyses in Mc and JV. gonorrhoeae, 
more than 100 putative PV genes have been proposed and revised 
[39-43]. Homopolymeric tracts in naturally occurring PV loci in 
Neisseria sp. have been shown to vary considerably in length 
ranging from 7 nt to 19 nt [43,44]. PV frequencies between 10 -4 
and 10 have been documented in Mc and these have been 
shown to change in both a tract length dependent manner and due 
to the influence of MMR [27,45,46]. A polyG tract length of nine 
nucleotides (G 9 ), representing the ON-state of a haemoglobin 
binding protein (HmbR), was shown to be under positive selection 
in Mc disease isolates compared to carrier isolates [44]. In the 
same study, the ON-state (Gi 0 and G13) of the genes encoding the 
haptoglobin-haemoglobin binding complex (HpuAB) was over- 
represented compared to the OFF-state in both carrier and disease 
isolates. A recent study found reduced expression states of multiple 
PV loci during long-term Mc carriage, in part due to antibody- 
mediated selection [47]. These studies illustrate that selection for 
expression of individual loci may fluctuate between ON- and OFF- 
states and is thereby responsible for the evolution of PV. Through 
their contribution to epitope variation, PV genes enable Mc cells 
to evade host immune responses [32,33,48], providing a substan- 
tial challenge in making vaccines with persistent and broad 
protection [49]. 

Base-base mismatches appear spontaneously during replication 
either on the synthesis or the template strand causing insertions 
and deletions events (for review see [21]). Most of such errors are 
accurately repaired by post-replication MMR [50]. It is during 
these replication steps that the repeat tracts of PV genes switch 
between ON/OFF states, thus MMR limits or counteract PV [46]. 
The correlation between MMR and PV has been particularly well 
documented in Mc [27,48] and defective MMR is a characteristic 
of Mc serogroup A disease suggesting that hyper-mutability may 
play a major role in virulence and infectivity [45]. Furthermore, 
Mc MMR deficiency has been found to enhance the escape from 
the bactericidal activity of a monoclonal antibody by PV of a 
relevant lipopolysaccharide gene (IgtG) [48]. Mc and many other 
bacterial entities of the (3- and y-Proteobacteria lack MutH in a 
methyl-independent MMR and the two main MMR components 
are therefore limited to MutS and MutL [15]. It was previously 
reported that Mc MMR components may be transiendy titrated 
during uptake of DNA during Mc transformation, demonstrating 



that the PV frequency may increase during transformation and PV 
modulation by MMR may vary with the mismatch load [17]. 

Materials and Methods 

Mathematical model of homopolymeric tract length 
evolution 

Probabilistic model. We consider homopolymeric tracts, 
which can have lengths varying from 5 to 13 nucleotides. Given a 
homopolymeric tract, let us denote by G(t) its length, which is a 
function of time. The function G(t) takes value in the set {5,. . ., 1 3}. 
For the sake of presentation, we introduce G, = i and denote by L 
the set of all possible tract lengths (L= {5,. ..,13}). In this section, 
we focus on the probabilistic evolution of the tract length of a 
homopolymeric tract. In section 2 we introduce selection and 
model the effect on tract length evolution. 

In an interval of time At, a. homopolymeric tract of length G t 
may acquire a new nucleotide and thus increase its length by one. 
The probability for such an event is y,At where y, is a given 
constant. It means that we have: 

P(G{t+Ai) = G i+ \\ G{t) = Gi) = y,At ( 1 ) 

That is, the probability that a tract with a length G, at time t 
ends up with a length G, + j at time t+At is equal to y,At. Similarly 
for nucleotide deletions we have: 

P(G(t + At) = Gi- 1 1 G(t) = d) = SiAt (2) 

where 8, is a given constant, that is, the probability that a tract of 
length Gi at time t ends up with a length G,_ ; at time t+At is equal 
to h,At. For simplicity we assume that those probabilities are linear 
within the range (L = {5,..., 13}) so that they can be written as: 

y t = a(G, - Go) + p and <5,- = S(G, - G 0 ) + /? 

where a, /?, a and /J are constants that designate the ability to make 
mistakes by slippage, a and a. are here modelled to influence the 
probability of generating tract length alterations (y and 8) in a tract 
length dependent manner (cis acting factor), ji and /} are constants 
that designate the baseline instability in a tract length independent 
manner (trans acting factor). 

Let us now derive the evolution equations of the probabilities 
Ui(t) = P(G(t) = G,) 

We have: 

n f (f+Af)= ^2 P(G(t + Af) = G,- 1 G(0 = Gj)P(G(t) = Gj (3) 

The probability that a tract length changes by one unit is given 
in (1) and (2). Furthermore, we assume that the event that a tract 
length changes by more than one unit occurs as a combination of 
successive single unit changes. Then, the probability of such an 
event is negligible with respect to At when At tends to zero and (3) 
can be rewritten as: 
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(4) 



n,(* + Ar) = P(G(t + At) = GMt) = Gi- 1 )P(G(t) = G, ■_ , ) 
+ P(G(?+ AO = Gi\G(i) = Gi)P(G(i) = Gi) 
+ P(G(t+At) = Gi\G(t) = G,+, )P(G(J) 
= G,+i)+o(A0 

Here, o(A/) denotes a function satisfying 



lim — — = 0, 

Ar-o At 

that is, a function that is negligible with respect to At, when At 
tends to zero. Since P is a probability, we have: 



P(G(t + At) = G,\ G(t) = G) 

= 1- ^2 P(G(t + At = Gj\G(t) = Gi) 

j^Lj^i 

which can be rewritten as: 

P(G(t + AO = Gi | G(0 = G,) = 1 - y, At - d f At + o(0 
and equation (4) becomes: 

Ui(t+At) = 

y,_ , A/n,_ j (0 + (1 - y, At - S i+ 1 Afll,-+ , (0 + o(At) 



(5) 



From equation (5), after dividing by At and letting At tend to 
zero, we obtain the following ordinary differential equation for II,, 



dUi 
dt 



-- 7i- 1 n, ■_ i - (y,. + <s,)n, + <5 i+ 1 n,+ , 



(6) 



We assume that the insertion (y) and deletion (5) probabilities 
are equal, that is, a = a and /? = /?. However, this may not 
necessarily be true in vivo and we also simulate two cases 
representing an insertion and a deletion bias. We then model 
the linear increase in tract instability (y and <5) of hypothetical 
DNA repair deficient populations (e.g. MMR deficient), attempt- 
ing to illustrate how DNA repair deficiency may influence the total 
instability relative to a wt population. The MMR deficient 
populations are assigned a higher initial /? compared to the wt 
population. We then model three different scenarios: ij a in the 
MMR deficient population is identical to the wt population 

— — = 1 , it) a for the MMR deficient population is higher than in 

the wt population, and the a ratio between the MMR deficient 
population and the wt is smaller than the ft ratio 
(1 <<Xldef/<Xlwt<Pdef/Piwt)> an d finally Hi) where the a. ratio and 
the /? ratio are equal in the two populations 
(\<a [de fla [wt = p def /P [wt ). 

The evolution of tract length distributions under the 
influence of selection. Here we tested the effect of selection 
(ON or OFF) on homopolymeric tract length evolution. We 
denote /i,- the reproduction rate and d, the death rate of organisms 



with tract lengths G,. We know that organisms harbouring 
homopolymeric tract lengths that are multiples of 3 allows for 
genetic expression (ON), so that, in the presence of selection for 
expression, their death rate becomes much lower than for 
organisms with tract lengths that are not multiples of 3. We can 
model that by taking: 



d, for/ = 6,9,12 



di otherwise 



where d, is a small mortality rate and dt is a large mortality rate. 

Let us now denote by SJt) the number of homopolymeric tracts 
of length i for a given time t. We consider a system where resources 
are limited and introduce a carrying capacity K. Then, adding 
birth, death rate to the model equation (6), we obtain the following 
evolution equation for S t : 



dSi 
dt 



y,_ , S t . i + {Hi- ^ - y, - Si)S, +d i+1 +S i+1 (7) 



Where Ns= Ylj=s 5,-denotes the total population. 

We make the simplifying assumptions that jii = )i does not 
depend on the DNA tract length, that the probabilities of 
nucleotide insertions and deletions are equal and that they are 
an increasing function of the length. Thus, the system can be 
rewritten as: 



dSi 
dt 



= 7i-iSi-i+(Hi 



djN s 



-2y,-)S,: + y,- +1 S, + i (8) 



We solve Equation (8) and look at the distribution of organisms 
with respect to their length at equilibrium (that is after a long 
time). The parameters a, /?, were manipulated while all other 
parameters were kept constant in order to investigate tract length 
evolution. 

We model the scenario where there is selection against genetic 
expression (OFF), decreasing the death rate for tract lengths that 
are not multiples of 3, by taking: 



<£ for /= 5,7,8,10,1 1,13 
dt otherwise 



We use this to model the allelic distribution at steady-state 
equilibrium and to model an oscillating effect of ON/OFF 
selection. In addition, we also include scenarios of no selection 
(constant death rate </, for all tract lengths), tract length dependent 
death rate (d t increasing by tract length), and combination of tract 
length dependent death rate and ON or OFF selection (both d s 
and di increase by tract length). 

Mapping the PV loci in Mc genomes 

In order to obtain an overview of the distribution of 
homopolymeric tract lengths in Mc, known PV loci in three 
sequenced and annotated Mc genomes (MC58, Z2491 and 
FAM18) were compiled and their respective tracts compared 
(Table SI). In the few cases of conflicting data, the most recent 
references were used. Additional BLAST searches and alignments 
were conducted to confirm previously published data and to 
address PV loci in Mc strains lacking data. 
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Bacterial strains, media and growth conditions 

Mc and Escherichia coli strains used in this study are listed in 
Table 1; jV". meningitidis serogroup A strain Z1099 was employed as 
in a previous publication [51] and grown on commercially 
available and CE marked 5% human blood agar plates or GC 
agar plates at 37°C with 5% C0 2 , while E. coli was grown on 
Luria-Bertani (LB) agar at 37°C. For selection, Mc was grown in 
the presence of 100 \lg ml 1 kanamycin (Kan), 3 |J,g ml 1 
erythromycin or 50 )J.g ml 1 spectinomycin (Spc), and for E. coli 
50 |J,g ml Kan or 50 )J.g ml ampicillin when appropriate. 

Preparation of PV constructs with different repeat 
lengths 

Plasmids employed are listed in Table 1 . The panel of pUD 
plasmids was transformed into Mc serogroup A strain Z1099 to 
allow the aadA reporter gene with varying polyG tract lengths to 
integrate into the hpuB locus in the chromosome as previously 
described [17,51]. The aadA gene encodes 3"-adenylyl-transferase 
conferring spectinomycin (Spc) resistance. Spc is an aminocyclitol, 
closely related to the aminoglycosides, which is bacteriostatic by 
interfering with the mRNA-ribosome interaction. In this live Mc 
model, AadA expression (ON) is achieved by the spontaneous 
nucleotide insertion or deletion of the polyG tract inserted into the 
coding region of the aadA gene encoding the fortieth amino acid 
and onwards [17]. The aadA reporter model is artificial in that it is 
not a PV gene in nature and is made by substituting a non- 
homopolymeric Gly-codon pair with consecutive polyG Gly- 
codons. Out-of-frame polyG tracts G 5 , G 7 , G 10 , G n and G 13 were 
made by site-directed mutagenesis of the Ga tract designed by 
Alexander and co-workers [17] following standard protocols from 
Stratagene [52], using Pfu Turbo polymerase and primers listed in 
Table 1 . In order to switch into frame, each construct required a 
single nt insertion or deletion as listed in Table 2. Two non- 
switching controls, Og and Oi 2 , in which the polyG tracts were 
replaced by non-homopolymeric codons (i.e. not GGG) transcrib- 
ing the same amino acid (Gly), were constructed (Table 2). 
Plasmids were transformed into E. coli strain ER2566 and repeat 
lengths were confirmed by DNA sequencing using an ABI BigDye 
Terminator v. 3.1 DNA sequencing kit (Applied Biosystems) and 
primers listed in Table 1 . Transformation of the plasmids into Mc 
strain Z1099 with an intact MMR was performed essentially as 
described previously [53]. Single colonies were cloned and the 
individual polyG repeat lengths were confirmed by PCR using 
Phusion High-Fidelity Polymerase (Finnzymes) to minimize stutter 
product formation [54] and subsequent DNA sequencing was 
undertaken as described above. 

Preparation of MMR null mutants 

To study the effect of a defect in the MMR pathway on PV 
frequency, Mc null mutants were constructed using a mutL knock- 
out plasmid pOYLAmutL::aph (Table 3). The pO¥LAmutL::aph 
containing the neisserial MMR gene mutL interrupted by a 
selective marker (aph) encoding Kan resistance was made using 
primers listed in Table 1, following standard protocols and 
procedures. Transformations of the pOHAmutL:aph into Mc with 
polyG tract constructs were conducted as described above. 

PV assay 

Mc wildtype (wt) and mutL mutants were grown on commer- 
cially available and CE marked 5% human blood agar plates. 
Single colonies were resuspended in 200 uL 1 x phosphate 
buffered saline (PBS) in multiple well plates (96 wells - round- 
bottom, Sarstedt, US). For the constructs with very short or long 



polyG tract (G s , G 7 and G 13 ), pooled suspensions of up to 10 single 
colonies were used to allow the appearance of sufficient numbers 
of colonies on selective plates. Serial dilutions for the assessment of 
spontaneous mutations and total number of Mc colony forming 
units (CFUs) were prepared using 20 uL stocks and 180 n,L 1 x 
PBS, from which 100 |xL of the appropriate dilution was then 
spread on selective (Spc) and non-selective blood plates, respec- 
tively. The polyG constructs were designed and confirmed to be 
initially out-of-frame (OFF), while rescued growth on selective 
media enabled a direct measure for the tendency of each tract to 
switch into reading frame (ON). The two non-repeat controls, 0 9 
and 0 12 , were measured as survival frequency as these already 
expressed AadA and Spc resistance. In this assay, PV occurs 
during non-selective growth by slipped-strand mispairing in the 
replication of a polyG tract inside the model gene, aadA. A subset 
of the initial Spc sensitive clone thereby becomes Spc resistant 
through nt insertions or deletions of the polyG tract, as defined in 
Table 2, allowing AadA expression. Switching frequencies are as 
such a measure of the proportion of a single colony that expresses 
AadA prior to selection. Notably, the proportion of a colony that 
expresses AadA is dependent on both the rate of switching, but 
also the number of cell divisions that follow switches, which 
produce clonal descendants. The introduction of Spc selection 
represents a botdeneck that is expected to allow the exclusive 
establishment of colonies from CFUs or parts thereof that have 
switched to an in-frame configuration and their non-switched 
descendants. The switching frequencies were studied for the panel 
of different repeat lengths in two backgrounds, wt and MMR 
deficient. Due to substantial variation in switching frequencies in 
replicate experiments and non-parametrically distributed data, 
trends of instability were monitored using median switching 
frequencies as in previous studies [27,46,55] and found to be the 
most acceptable approach in predicting mutation rates [56]. 
Statistical analyses of the non-parametric switching frequencies 
were performed using Kruskal-Wallis one-way analysis of variance 
with post Dunn's multiple comparison tests. 

Colony size measurement 

The range of colony sizes for each Spc resistant polyG construct 
and the non-repeat control was investigated in order to determine 
a suspected influence by the number of tract encoded amino acids 
in AadA on growth in the presence of Spc. Full resolution images 
of randomly chosen antibiotic resistant colonies following the PV 
assay were obtained after 48 hours growth using a Leica DC 300 
digital camera fitted onto a Leica MZ 1 6A magnifier. Images were 
analyzed using the ImageJ software setting the threshold at the 
maximum extent of the colonies, measuring the area in mm 2 . Care 
was taken to avoid the effect of nutrient depletion or other density- 
dependent growth inhibition by studying isolated single colonies 
located several colony diameters away from other colonies. The 
parametric colony size distribution and correlation with tract 
length were calculated using the Student's t-test. 

Results 

Mathematical model of SSR dynamics and the influence 
on tract length distributions 

A mathematical model was developed to study the evolutionary 
dynamics of a homopolymeric SSR using a set of simple 
parameters as described in the materials and methods. First, the 
probabilistic evolution of homopolymeric tract length was 
modelled considering a cis acting factor, tract length dependent 
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Table 1. Plasmids, primers and strains. 





Plasmids: 


Description 


Reference 


pARR2107 


spOFF and erm-resistance gene flanked by Mc hpuB 


[1] 


pBluescript SKII+ 
(pBSKII+) 


Cloning vector, Amp R 


Stratagene 


pUP6 


Cloning vector pHSS6 modified with DUS and 
reverse DUS, Kan R 


[2] 


pUD8 


pARR2107 with inserted 12-mer DUS downstream of 
spOFF, with G 8 tract 


This study 


pUD5 


Same as above, but with G 5 tract 


This study 


pUD7 


Same as above, but with G 7 tract 


This study 


pUDIO 


Same as above, but with G 10 tract 


This study 


pUDII 


Same as above, but with Gn tract 


This study 


pUD13 


Same as above, but with G 13 tract 


This study 


p09 


pARR2107 with inserted 12-mer DUS downstream of 
spOFF, with 9 nt non-repeat tract in ON-state 


This study 


p012 


Same as above, but with 12 nt non-repeat tract in 
ON-state 


This study 


pOHAmutb:aph 


pBSK 11+ harboring mutL from MC58 with inserted 
kanamycin resistance marker aph 


This study 


Strains: 


Description 


Reference 


N. meningitidis strain 
Z1099 


Serogroup A, isolated in the Philippines in 1968 


Dominique A. Caugant 


E coli strain ER2566 


Electrocompetent cells prepared following standard 
protocol 


New England Biolabs 


Primers: 


Description 


Sequence 


7160SAF-Tn5-opf)-for 


Construction of pOHAmutL::aph 


GCGGATCCTAGACTGGGCGGTTTTATGG 


7161SAF-Tn5-ap/7-rev 


Construction of pOHAmutL::aph 


GCTCTAGATCATTTCGAACCCCAGAGTC 


7162OHA101 


Construction of pOHAmutL::aph 


CGCTCGAGGGCGGATTCAGTCTTGGTAA 


7163OHA106 


Construction of pOHAmutL::aph 


GCGGATCCTTACTGTCCGCGCAAGAACA 


7164 OHA105 


Construction of pOHAmutL::aph 


GCTCTAGAGCCGAAAGTGCTAGAATACG 


7165OHA104 


Construction of pOHAmutL::aph 


GCGAGCTCCGTTTTGAGCATAGCGTTGA 


KA24 


pUD7 


GTACGGCTCTGCAGTGGATGGGGGGGCTGAAGCCACACAG 


KA25 


pUD7 


CTTCAGCCCCCCCATCCACTGCAGAGCCGTACAAATGTAC 


KA26 


Sequencing primer 


GTGATCGCCGAAGTATCGAC 


KA27 


Sequencing primer 


AGTTCGCGCTTAGCTGGATA 


KA28 


pUD5 


GTACGGCTCTGCAGTGGATGGGGGCTGAAGCCACACAG 


KA29 


pUD5 


CTTCAGCCCCCATCCACTGCAGAGCCGTACAAATGTAC 


KA30 


pUDIO 


rrpj rrrrTr m rr!irTrrzi t VC A hC-.CC AC AC AC 
Lj 1 AALjLjL 1 LA 1 vA::A - UVJUVJUUUUVJUL 1 UHAUV_L.ttL.AL.rtVJ 


KA31 


pUDIO 


r'Wr'hrCCCCCCCCCC ATCC ACTCC AC ACCCCT AC A A ATCTAC 
LI 1 v., A L ; V^. ^. ^. V_ v_ V— V- V_ V- V- A 1 L.L.rtL. 1 VJL_rtL3ttVJ\_L.LD 1 rtk-rtrtA 1 \J 1 nL 


KA32 


pUD1 1 


rrpj rrrrTr m rr!irTrrzi v vf\C\(^tC\C\C\C\(\C\C^C\( YCAACC C AC AC AC 
A 1 AALjLjL 1 LA LjLALj 1 LLA1UUUUUUUUUUUL 1 UHrtULLnLMLrtU 


KA33 


pUD1 1 


nrpmnj rCCCCCCCCCCC ATCC ACTCC AC ACCCCT AC A A ATCT AC 
LA . A 1 . ; V*. ^. V_ v_ v_ V_ V_ V_ V_ V— V_ A 1 L.L.rtL. 1 uLMUAULLU 1 rt\_Mrtrt 1 VJ 1 ttL. 


KA91 


pUD13 


GTACGGCTCTGCAGTGGATGGGGGGGGGGGGGCTGAAGCCACA- 
CAG 


KA92 


pUD13 


CTTCAGCCCCCCCCCCCCCATCCACTGCAGAGCCG- 
TACAAATGTAC 


KA111 


p09 


GTACGGCTCTGCAGTGGATGGCGGCGGCCTGAAGCCACACAG 


KA112 


p09 


CTTCAGGCCGCCGCCATCCACTGCAGAGCCGTACAAATGTAC 


JEE7 


p012 


GGCGGCGGCGGCCTGAAGCCACACAGTGATATTGATTTG 


JEE8 


p012 


GCCGCCGCCGCCATCCACTGCGGAGCCGT 


1. Alexander HL, Richardson AR, Stojiljkovic 1 (2004) Natural transformation and phase 


variation modulation in Neisseria meningitidis. Mol Microbiol 52: 771-783. 



2. Wolfgang M, Van Putten JPM, Hayes SF, Koomey M (1 999) The comP locus of Neisseria gonorrhoeae encodes a type IV prepilin that is dispensable for pilus biogenesis 
but essential for natural transformation. Mol Microbiol 31: 1345-1357. 
doi:1 0.1 371/joumal.pone.01 01 637.t001 
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instability (a), and a trans acting factor, tract length independent 
instability {fi). f! effectively describes a baseline instability that is 
equal for all tract lengths. Figure 1 shows the influence of f> on 
tract length evolution in the absence of a, without penalizing 
selection (see below) and when the range of possible tracts is 
limited upwards (13 nt) and downwards (5 nt). When the tract 
range is limited, the evolution of tract lengths takes a normal 
distribution. 

Secondly, we modelled a linear increase in total instability (y 
and 8) of hypothetical DNA repair deficient populations compared 
to a wt population (Figure SI). In this model, the values of a and f> 
for the DNA repair deficient populations were factors of a and fl 
from the wt population. By altering the a ratios in the MMR 
deficient populations we observed: i) and it) a relative decreasing 
influence on instability by MMR deficiency as tracts become 
longer, Hi) the level of instability of the MMR deficient population 
remains twice that of the wt population, regardless of tract length 
(Figure SI). 

Thirdly, the influence of penalizing selection was modelled by 
implementing higher mortality rates for tracts that were not 
multiples of 3, analogous to the in-frame tract length that allows 
AadA expression (ON) in the PV assay. Figure 2 A-D shows the 
steady-state allelic distribution of homopolymeric tracts in model 
populations that experience penalizing selection and differential 
levels of genetic instability (tract length dependent a and 
independent fl). Panel A shows the steady-state distribution of 
alleles in a population that experiences relatively low levels of 
instability from both cis [a, = 1 .0 x 10" 03 ) and trans {fi = 0.8) factors. 
The alleles most frequently expressed in the steady-state popula- 
tion in Panel A are Gg, Gg and G12, representing the in-frame 
(ON) tract lengths. By increasing the cis factor five-fold 
(a = 5.0x10 °' i ), the allelic distribution shifts towards the shorter 
in-frame tract (G 6 ) as shown in panel B. By comparing panel B to 
panel C, it is shown that by increasing the trans factor 2.5-fold 
{fi — 2), the allelic distribution shifts towards a normal distribution 
within the defined range (5-13 nt). Comparing panel C and panel 
D, the cis factor is increased 400-fold (cc = 2) and the trans factor is 
increased tenfold {fi = 20), keeping the distribution biased towards 
shorter tracts and in a near normal distribution. 

We observed that an insertion bias (y><5) drives the population 
of tract lengths at equilibrium towards longer tracts (Figure S2, 
panels C and D) which peak at G 9 compared to a deletion bias (y< 
<5) which peak at shorter tract lengths (Figure S3, panels C and D). 

It can be assumed that tract length evolution in nature fluctuate 
between periods of no selection, selection for expression and 



selection for no expression. Figure S4 shows the death rate 
imposed by these three different scenarios and Figure S4 easel— 3 
the steady-state outcomes. With a constant death rate and no 
selection (Figure S4 easel), the steady-state of the allelic 
distribution remains equal to or similar to normal distribution 
(panels A-C). When a is increased the distribution is skewed 
towards shorter tracts (panel D). In the scenario with OFF 
selection (Figure S4 case 3), the allelic distributions mirrors that 
with ON selection as shown in case 2 (Figure S4 case 2) and 
Figure 2; again an increase in a skews the population towards 
shorter tracts (panels A and B), while an increase in fi forces the 
distribution to approach normal distribution and reduces the total 
number of bacteria in the population at equilibrium (panels C and 
D). An oscillating effect between ON and OFF selection is shown 
in Video SI (and the corresponding time average in Figure S5). In 
light of the PV assay and colony size measurements demonstrating 
reduced fitness/higher mortality rates for longer in-frame tracts 
(G 9 and G 12 ) compared to shorter tract (Gg) (see below), we 
introduced a tract length dependent death rate (Figure S4 case 4). 
Under this scenario, the steady-state allelic distribution is skewed 
towards shorter tracts (panels A and B). An increase in a. and 
particularly ft are required to skew the distribution towards longer 
tract lengths (panel C and D), again with a dramatic reduction of 
the total number of bacteria in the population. Different scenarios 
for selection (ON or OFF) were combined with the tract length 
dependent death rate to show their influence on tract length 
evolution (Figure S4 case 5 and Figure S4 case 6). Similar to the 
tract length dependent death rate (Figure S4 case 4), the scenarios 
for selection (ON or OFF) combined with a tract length dependent 
death rate show that the steady-state allelic distribution is skewed 
towards shorter tracts when a and ft are low (panels A and B in 
Figure S4 case 5 and Figure S4 case 6). For the scenario with tract 
length dependent death rate and ON selection (Figure S4 case 5), 
the increase in ft reduces the total number of bacteria at 
equilibrium (panels B and C). A further increase in both a. and 
ft force the population towards normal distribution, causing 
further reduction in the total number of bacteria at equilibrium 
(panel D). In the final scenario, with tract length dependent death 
rate and OFF selection (Figure S4 case 6), an increase in ft when a 
is relatively low allows the allelic distribution to drift towards 
longer tracts (panel C). Further increases in a and ft force the 
population towards normal distribution, again effectively reducing 
the total number of bacteria at equilibrium (panel D). 



Table 2. Sequence of the polyG tract constructs. 





Constructs 


PolyG tract region 


Required OFF ON mutation 


# of AA as ON* 


G 5 


GTG GAT GGG GG- CTG AAG CCA 


+1 nt (to G 5 ) 


2xGly 


G 7 


GTG GAT GGG GGG G- CTG AAG CCA 


-1 nt (to G 6 ) 


2xGly 


G 8 


GTG GAT GGG GGG GG- CTG AAG CCA 


+1 nt (to G 9 ) 


3xGly 


G 10 


GTG GAT GGG GGG GGG G- CTG AAG CCA 


-1 nt (to G 9 ) 


3xGly 


G„ 


GTG GAT GGG GGG GGG GG- CTG AAG CCA 


+1 nt (to G 12 ) 


4xGly 


G 13 


GTG GAT GGG GGG GGG GGG G- CTG AAG CCA 


-1 nt (to G 12 ) 


4xGly 


o 9 


GTG GAT GGC GGC GGC CTG AAG CCA 


In-frame 


3xGly 


o 12 


GTG GAT GGC GGT GGA GGG CTG AAG CCA 


In-frame 


4xGly 



*This lists the number of amino acids (AA) only in the tract as seen bold in the table, Gly = Glycine. 
doi:1 0.1 371/joumal.pone.OI 01 637.t002 
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Table 3. OFF^ON Switching frequencies. 





Initial tract 
length* 


Median (range) (wt) 


Step-wise 
comparison (wt)** 


Median (range) 
(MMR) 




Step-wise 

comparison 

(MMR)** 


Comparison wt 
& MMR deficient** 


Comparison repeat & 
non-repeat tract** 


G 5 (G 6 ) 


7.7x1CT 12 

(5.1 Xl0" 12 -2.7x10 _n ) 




1.7 x10"° 9 
(6.8x10~ 12 -2.1 xlO 


-05, 




-52.9, ns 


N/A 






-43.3, ns 






-67.2, p<0.05 






G 7 (G 6 ) 


6.5X10- 10 

(1.4x10 _11 -1.5x10 -07 ) 




1.9 x10~ 07 
(1.6x10~ ,0 -2.1 x10~ 


-05, 




-76.7, p£0.05 


N/A 






-50.5, ns 






-44.1, ns 






G„ (G 9 ) 


2.4 xlCT 08 

(1.3x10 _09 -2.1 x10~ 06 ) 




9.3x10~ 06 
(1.7x10~ 08 -1.0x10~ 


-03, 




-70.3, p£0.05 


-147.2, p=£0.001 (09) 






-20.2, ns 






38.4, ns 






G 10 (G 9 ) 


1.2 xlCT 07 

(3.0x10 _10 -1.1 x10~ 05 ) 




5.2 x10~ 07 
(3.2x10~ 10 -1.4x10~ 


-05, 




-11.7, ns 


-96.7, p£0.05 (09) 






57.7, ns 






77 .1, ns 






G„ (G 12 ) 


1.2x10-°' 

(1.2x10~"-5.3x10~ 07 ) 




1.2x10-° 9 
(4.6x10 _1, ~1.1 x10~ 


-07, 




7.7, ns 


-111.9, p<0.05 (012) 






-0.5, ns 






-26.0, ns 






G 13 (G 12 ) 


4.5x1(T 09 

(1.4x10~ 11 -2.8x10~ 07 ) 




1.1 x10~ 08 
(1.0x10~ 10 -4.0x10 


-07, 




-17.9, ns 


-111.5, p<0.05 (012) 




0 9 (in-frame)' 


0.6 (0.5-1.1) 




N/A 






N/A 








22.3, ns 






N/A 






0 12 (in-frame)' 


2.2 x10~ 04 

(2.4x10~ 07 -5.6x10~ 04 ) 




N/A 






N/A 





*lnitial tract length and closest ON configuration in (), see Table 2. 

**Kruskal-Wallis with post Dunn's multiple comparison tests, difference in rank sum. 

'Median survival frequencies are shown for the two non-repeat controls (0 9 and 0 12 ). 

No significant difference denoted by "ns". 

Non appropriate comparison denoted by 

Not available denoted by "N/A". 

doi:1 0.1 371 /joumal.pone.01 01 637.W03 



The distribution of PV homopolymeric tracts in three Mc 
strains 

The distribution of naturally occurring PV loci in Mc was 
reviewed here in order to understand the biologically relevant 
tracts and tract length range as studied in a live model. In Mc, out 
of 79 putative PV genes investigated, 50 were represented with 
homopolymeric tracts, 21 of which were polyC, while 18 were 
polyG, 8 were polyA and only 1 polyT, relative to the 
transcriptional direction of the individual gene (Table SI). Two 
additional genes, mtJB and dnaX, had different homopolymeric 
motifs in different strains and are not included here. Out of the 79 
putative PV genes, 68 of the genes were represented with 
intragenic repeat tracts (Table SI). The distribution of the 
homopolymeric tract lengths of putative PV loci in three different 
Mc genomes were found to be skewed towards shorter tracts, with 
the majority of loci constituted by 7 nt repeat tracts (Figure 3). 
Notably, many PV genes remain unverified and may be the 
consequence of other drivers than fluctuating selection for 
expression such as GC content harmonization. Also, the individual 
tract lengths listed in Table SI are snapshots in time of an 
individual genome and are expected to vary when subject to 
selection in live populations. Average tract lengths in all three 
strains were found to be close to 9 nt (8.84, 9.45 and 9.55 for 
strains MC58, Z2491 and FAM18, respectively) (Figure 3 and 
Table SI). The average length of homopolymeric tracts consisting 



of C/G was not found to be statistically different from tracts 
consisting of A/T {p = 0.096). 

Switching frequencies are influenced by tract length and 
MMR deficiency 

In a live Mc model, a PV assay was employed to investigate 
switching frequencies (OFF-^-ON) of Spc resistance in a panel of 
polyG tracts accommodated inside the model gene encoding 
AadA. From the start codon, the polyG tracts were constructed at 
1 18 nt and onwards inside the 795 nt aadA gene corresponding to 
amino acids 40 and onwards as initially described by Alexander et 
al. [17]. Rescued growth on selective media containing Spc 
enabled a measure for the tendency of each polyG tract to switch 
OFF— >ON expression of Spc resistance. The range of polyG tracts 
accommodated in aadA, 5 to 13 nt in length, as listed in Table 2, 
was monitored in the PV assay. Additionally, two non-repeat 
control strains (Og and Oi 2 ), both in the ON-configuration, were 
included (Table 2). 

In the wt background, the median switching frequency in the 
strain harbouring the shortest polyG tract (G 5 ) was found to just 
exceed that of the spontaneous mutation frequency to Spc 
resistance (7.7xl0 _lz vs 1.0xl0 _li ), which is known to appear 
in Mc due to mutations in 16S rRNA [57] (Figure 4 and Table 3). 
The G 7 strain switched with two orders of magnitude higher 
median frequency to ON (6.5xlO~ 1H ) than G 5 (7.7xl0~ 12 ). 
Sequencing of Spc resistant colonies in the PV assay confirmed 



PLOS ONE | www.plosone.org 



7 



July 2014 | Volume 9 | Issue 7 | e101637 



Meningococcal PV Models 




Number of bases 



Figure 1. Mathematical model illustrating random walk. Allelic 
steady-state distribution is shown of a modelled bacterial population 
without selection and no contribution of the cis acting factor a. The 
figure illustrate the isolated effect of the trans acting factor ft on the 
distribution of the tract lengths, an effect caused by increased random 
walk within a defined range of tract lengths. Where K=100, /( = 5, and 
dboth = 5. 

doi:1 0.1 371 /journal.pone.01 01 637.g001 

that both G 5 and G 7 switched to G 6 , the nearest ON configuration 
that required a single nucleotide insertion/deletion, respectively. 
The median switching frequency observed with the G 8 -strain 
(2.4xl0 -08 ) was two orders of magnitude higher than the 
frequency of the G 7 strain and hence four orders higher than 
the G 5 . The switching frequency of Gio (1.2x10 ° 7 ) was found to 
be only one order of magnitude higher than G 8 , significantly 
different from G 7 (p<0.05) and the highest recorded median 
frequency in the wt background. Sequencing of the polyG tract in 
Spc-resistant colonies confirmed that both G 8 and Giq switched to 
G 9 , the nearest ON configuration that required a single nucleotide 
insertion/deletion, respectively. Both Gu (1.2 xlO -09 ) and G13 
(4.5 xlO -09 ) switched to the nearest ON configuration G12 with 
median frequencies two orders of magnitude lower than the 
frequency of G 10 . The median switching frequencies in the wt 
background were found to be higher, but not statistically 
significant, for tract lengths requiring a deletion (G 7 , G 10 and 
G13) compared to tract lengths requiring an insertion (G 5 , G 8 and 
Gu) (Table 3). Two ON strains with non-repeat tracts were 
included in the PV assay as controls. The Og strain displayed a 
median switching frequency (ON^-OFF) close to 1 (0.6) (i.e. 100% 
survival) and in comparison the O12 strain displayed a median 
survival frequency (2.2 xlO -04 ) four orders of magnitude lower 
than Og. All median survival frequencies in the control strains 
differed significantly from the switching frequencies in the relevant 
polyG strains as outlined in Table 3. 

In order to investigate the association between PV and MMR, 
MutL null mutants were made for each individual Mc strain 
harbouring different polyG tract lengths in aadA, and these were 
monitored in the same manner as the strains in the wt background. 
In the MMR background, a similar stepwise increase in switching 



frequencies was observed in the G 5 to G 8 range as for that 
observed in the wt background (Figure 4 and Table 3). The G 7 
MMR strain switched with two orders of magnitude higher 
median switching frequency (1.9x10 ° 7 ) than G5 ( 1 . 7 x 1 0 ° 9 ) and 
thereby differed significantly from G 5 (/>£0.05). Furthermore, the 
median switching frequency observed with the G 8 -strain 
(9.3x10 6 ) was the highest recorded in the assay and nearly 
two orders of magnitude higher than the frequency of the G 7 strain 
(1.9xl0~ ) and hence nearly four orders higher than that of the 
G5 (/><0.001). The median switching frequency of the Gjo 
(5.2x10 ) was somewhat lower compared to G 8 (9.3x10 0 ), 
and the difference is not significant. The median switching 
frequencies of the G„ (1.2 xlO"" 9 ) and G13 (1.1 xlO" 08 ) strains 
were further reduced by more than two orders of magnitude 
compared to the G 8 strain and the differences were significant {p< 
0.001 and /i<0.05, respectively), displaying a similar pattern to 
that of the wt. Taken together, MutL-derived MMR deficiency 
was shown to increase the median switching frequency relative to 
wt by nearly three orders of magnitude in individual comparisons 
in the G^-Gg range. No difference was observed in comparing 
switching frequencies in MMR and wt strains with polyG tracts in 
the Gk)-Gi 3 range. 

Colony size correlates inversely with polyG tract length 
after growth on selective media 

The results from the PV assay described above did not display 
an association between the longest tract lengths (>10 nt) and the 
highest frequencies of switching that was expected. This observa- 
tion warranted further investigation. It was noticed during the PV 
assay that strains with different polyG tract lengths not only 
switched to resistance with different frequencies, but also that these 
strains displayed varying colony sizes only when grown on selective 
media (Figure S6). Strains with the shorter tract lengths seemed to 
yield larger colonies than strains with longer tracts. Therefore, the 
colony size (area) of re-plated ON strains in both backgrounds 
were measured after 48 h growth on selective media. The data was 
grouped by ON tract lengths (i.e. G s / G 7 = G 6 , G 8 / G 10 = G 9 and 
G11/G13 = Gi 2 ) (Figure 5 and Table 4). No statistical differences in 
colony size were found within each group or between backgrounds 
(wt vs. MMR deficient), nor between strains grown in the absence 
of Spc selection. However, highly significant differences were 
found (/<<0.001) between groups G 6 vs. G 9 and between G 9 vs. 
G] 2 (and G 6 vs. Gi 2 ) where G^ were the largest and Gi 2 the 
smallest colonies. G 9 colonies were on average smaller than half 
the average G 6 colony and the average G \ 2 colony was yet smaller 
than half the average G 9 colony. 

Discussion 

Mathematical modelling of the dynamic allelic distribution of 
homopolymeric tracts in a population predicted that tract length 
dependent, cis (a) and tract length independent trans (/?) acting 
factors of stability influence tract length evolution differently, ft 
designates the baseline instability that corresponds to the 
probability for making a transition to a different length. 
Transitions happen randomly (unless a bias is applied) and the ft 
function causes random walk. An increase in ji allows the allelic 
distribution to drift towards longer tracts where a is increased. In 
the absence of /? the allelic distribution may not drift away from 
the shorter tracts where the effect of a is lower. The random walk 
effect of \i is illustrated in Figure 1, where there is no penalizing 
selection and a = 0, illustrating how the allelic population is forced 
towards a normal distribution by the set tract length range. In vivo 
this range may be defined by yet other factors such as biases in 
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Figure 2. Mathematical modelling of the evolution of homopolymeric tract length. Allelic steady-state distributions are shown following 
different values of tract length dependent instability (a) and tract length independent instability (/}) (panel A-D).The influence of selection was taken 
into account by implementing penalizing selection on tracts that were not multiples of 3, corresponding to the out-of-frame tract length selection 
seen in the PV assay. Where /C= 100, /i = 5, d s = 1 and d,=5. 
doi:1 0.1 371 /journal.pone.01 01 637.g002 



proofreading (insertion/deletion) and protein constraints. A high 
tract length dependent a favoured shorter tracts over the course of 
evolution (Figure 2, panels A and B). In the extreme example with 
high a and P (Figure 2 panel D), not even the penalizing large 
death rate (dfj that applies to tract lengths that are other than 
multiples of 3, is sufficiently powerful to prevent near complete 
fluidity between allelic forms. The increased /? forces a normal 
distribution of the allelic forms by increasing the probability for 
making a transition to a different length in the manner of random 
walk. The penalizing selection (dj) also becomes more efficient 
when tracts are rapidly changing away from tract lengths with 
lower death rate (flQ. These increasingly forced transitions to a 
different tract length have in turn a dramatic effect on fitness of the 
population as shown by the relative few surviving bacteria of each 
allelic form at equilibrium (Figure 2, panel D). 

Alterations of the tract length occur due to polymerase slippage 
during replication [58-60]. Given that an ideal mutation rate 
exists at a particular locus at a moment in time, an error-prone 
polymerase may be able to provide that instability using shorter 
tracts than a high fidelity polymerase. However, overexpression of 
the single error-prone polymerase (DinB) in Mc does not cause a 
global mutator phenotype such as in E. coli [46] . Also, tract lengths 
may not exclusively evolve from their ability to fine-tune an ideal 
mutation rate but may be determined by molecular drivers [30]. 



Hypothetical MMR deficient populations were modelled with 
higher ft values, and a values as functions of a. in the wt population 
(Figure SI). This approximation may not fully represent the in vivo 
state since MMR deficiency also could be represented with ft as a 
factor of a. In the PV assay however, MMR deficiency was found 
to cause 200-400-fold increase in instability of the G 5 , G 7 and G H 
tracts. Therefore, within the defined range of tracts, a linear model 
may prove to be a useful approximation. The model allowed us to 
visualize a range of different contributions of MMR deficiency in a 
tract length dependent manner. The different scenarios modelled 
included: i) parallel increase in instability in wt and MMR 
populations where the contribution of MMR relative to the total 
instability decreased, a) increased relative contribution of MMR 
and Hi) a constant contribution of MMR deficiency to the total 
instability for all tract lengths. The relative reduction in the 
contribution of the modelled MMR deficiency may be similar to 
the decreasing effect of MMR deficiency with increasing tract 
length that was observed in the PV assay (see below for G 8 /G 10 
comparison). DNA repair deficient strains known as mutator 
strains generate a broad spectrum of variants at great cost as most 
mutations are deleterious (in our model those generating G 5 , G 7 , 
G 8 , Gin, G n , G13 when ON-selection applies) and fewer are 
advantageous (G 6 , Gg and G 12 ). Despite the costs, mutator strains 
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Strains 



□ MC58 
□ Z2491 
■ FAM18 




Homopolymeric tract length 



Figure 3. Distribution of homopolymeric tract lengths in Mc strains. A comparison of the homopolymeric tract lengths of putative PV genes 
identified in three different Mc strains (see references in Table S1). 
doi:1 0.1 371 /journal.pone.01 01 637.g003 



a Repeat tract (wt) 

Repeat tract (MMR def.) 
• Non- repeat tract 




Tract length (nt) 



Figure 4. Switching frequencies of a range of polyG tracts in vivo in Mc wt and MMR deficient backgrounds. Switching frequencies (left 
y-axis) are shown for each tract length G 5 to G 13 in wt (black pyramids), the MMR deficient background (grey squares), and the survival frequencies 
(right y-axis) for the non-repeat controls (black circles). Median value indicated for each tract length for wt (red) and MMR deficient (blue) 
background. Dotted line indicates the spontaneous mutation frequency for Spc resistance (1.0x10~ 12 ). 
doi:1 0.1 371 /journal.pone.01 01 637.g004 
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modeled different scenarios of death rate (d); no selection, ON/ 
OFF selection, linear increasing death rate, and combination of 
linear increasing death rate and ON/ OFF selection (Figure S4). A 
scenario with fluctuating ON/OFF selection (Figure S4, cases 1 
and 2; Video SI) may effectively describe the natural evolution of 
PV loci in Mc, and the model PV assay employed in this study (see 
below). The oscillating ON/OFF selection (Video SI) shows the 
dynamic shifts in the allelic distribution of tract lengths that on 
average approach a normal distribution with median tract length 
at 9 (Figure S5). When comparing the allelic distribution of the 
time average to the corresponding allelic distributions of a single 
ON or OFF selection scenario with the same a. and ft values (panel 
B in Figure S4 case 2 and case 3), it becomes evident that modeled 
populations under oscillating selection can maintain longer tract 
lengths. This may explain how longer repeat tracts are maintained 
in bacteria in nature i.e dependent on oscillating selection. 

The review of the existing literature addressing neisserial PV 
events revealed a total of 87 putative PV genes in the three selected 
Mc genomes, all three with average homopolymeric tract length of 
~9 nt (Figure 3). It has been shown by Markov chain analysis of 
the MC58 genome that C/G-tracts longer than 6 nt, and 
particularly 8 nt, were found in considerably higher excess than 
expected in this organism [39]. Contrary to a previous report [39], 
this study did not reveal any statistical difference between 
homopolymeric C/G and A/T tract lengths. The distribution of 
the homopolymeric tracts identified in this study exhibits a bias 
towards C/G repeats (39:9) and intragenic location (41:9) as 
shown in Table SI, whereas genomic analysis of hypothetical 
homopolymeric tracts in the coding region has revealed an 
underrepresentation of C/G repeats with less observed tracts than 
expected from random distribution [2] . The overrepresentation of 
homopolymeric tract lengths around 9 nt may indicate a suitable 
range for balancing variation and conservation, where short tracts 
will be too stable and long tracts will be too unstable [2,65,66]. A 
previous study has shown how evolutionary pressures may mould 
such PV loci, showing that the position of the repeat in pathogenic 
bacteria is suppressed in the middle and enriched near the termini 
of the genes [2] . Notably, many of the PV loci listed in Table S 1 
are not verified as PV genes and the presence of SSRs may be the 
consequence of e.g. GC content harmonization. Also, the 
individual tract lengths listed in Table SI are snapshots in time 
of an individual genome and are expected to vary when subject to 
selection in live populations. Further characterization of individual 



Table 4. Colony size (mm 2 ). 





In-frame tract length* 


Mean (95% CI) 


Step-wise comparison** 


Comparison repeat & non-repeat tract** 


Ge 


10.9 (9.9-12.0) 




N/A 


f = 15.2, df=91, p<0.0001 


G 9 


4.4 (4.0-4.7) 




f = 2.3, dt =95, p==0.05 (09) 


f = 23.4, df=151, p<0.0001 


G 12 


0.6 (0.5-0.6) 




f = 6.2, <#=138, p==0.0001 (012) 


o 9 


3.7 (3.4-4.1) 






f=26.4, df = 82, prSO.0001 


o 12 


0.3 (0.2-0.3) 







*ln-frame ON tract length, see Table 2. 
**Unpaired Student's t-test, two-tailed p-value. 
Non appropriate comparison denoted by 
Not available denoted by "N/A". 
doi:1 0.1 371 /journal.pone.01 01 637.t004 



20 







E 15- 


♦ 


E 

O 




>. 
c 

1 5- 

o 





Repeat tract 
Non-repeat tract 



Tract length (nt) 

Figure 5. An inverse relationship between Mc colony size and 
homopolymeric tract length. Colony sizes (area) of CFUs following 
the PV assay are shown for the in-frame tract length of the PV 
constructs (black diamonds), and the non-repeat controls (black circles). 
Mean value for each in-frame tract length indicated (red). 
doi:1 0.1 371 /journal.pone.01 01 637.g005 

have been shown to be successful in several models and organisms 
[61,62]. 

Previous studies of the instability of long (>30 nt) repeat tract 
lengths in bacteria have revealed a deletion bias [22,29,63,64], 
while for eukaryotic cells a possible insertion bias has been 
proposed [24,29]. In C. jejuni, Bayliss and co-workers showed that 
biases in the mutational pattern changed from predominantly 
insertions for G 8 and Gg to predominantly deletions for G 10 and 
Gii [30]. This shift may potentially be related to how indels 
predominantly are repaired by polymerase proofreading in short 
tracts to MMR in longer tracts as discussed in [30]. A possible 
insertion bias for G 8 and G 10 was observed in the PV assay (see 
below). Adjusting the mathematical model with insertion/ deletion 
bias revealed how the steady-state population tends towards longer 
and shorter tracts for an insertion bias and a deletion bias, 
respectively (Figures S2 and S3, see panels C and D). 

The mathematical model makes the assumption that the 
reproduction rate is independent of the tract length, whereas the 
death rate is dependent of the tract length, shown in Figure 2, with 
small death rates (</,) for tract lengths in multiples of 3 (Gg, Gg and 
G12), and large death rates (dj) for all other tract lengths (G 5 , G 7 , 
G R etc.). Findings from the model PV assay used here and colony 
size measurements indicate reduced fitness/survival for long in- 
frame tracts compared to short tracts (see below), as such we 
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putative PV loci and genome comparisons are thus warranted to 
better understand SSR evolution. 

This study investigated the extent to which the length of a polyG 
tract and MMR influences tract instability by monitoring shifting 
expression of an antibiotic resistance reporter, AadA, conferring 
Spc resistance. Mc PV frequencies have been found to differ 
depending on the PV monitoring system used [17]. Alexander and 
co-workers found that strains devoid of haemoglobin utilization 
(AhmbR/ AhpuB) continued growth for ~3 divisions when cultivat- 
ed on iron-limited media causing an overestimation of hmbR 
OFF-^ON switching compared to aadA Spc s ^Spc R switching 
[17]. A model PV assay of Spc s ^Spc R was therefore employed in 
the live model presented here. The range of switching frequencies 
observed in the model PV assay were found to be lower compared 
to previously published Mc PV frequencies (10 4 to 10 5 ) 
[27,45,46] and to PV frequencies in other bacteria (10 2 to 
10 4 ) [22,30]. This may be due to strain differences. Previous 
publications have noted many fold differences in switching 
frequency of constructed PV loci between different Mc strains 
(10~ 2 to 10" 6 ) [20,27]. The observed PV frequencies for Z1099, a 
Mc serogroup A strain, fall within the frequency range previously 
observed for a Mc serogroup A strain with overexpressed MutS 
(~10 ) [17]. Z1099 does therefore exhibit switching frequencies 
in the lower range of previously published frequencies, and the 
comparative status of DNA repair in this strain remains to be 
investigated at the molecular level. 

The results from the PV assay indicate an expected stepwise 
increase in switching frequencies with increased tract length (from 
G 5 to G 8 ), most probably due to the tract length dependent 
instability (a cis acting factor). The stepwise log-linear increase in 
PV for the repeat tract lengths G 5 to G 8 was comparable to 
previous studies using the same basic construct [45], and with 
homopolymeric repeat tracts in yeast [24] . The median frequency 
observed for G 10 deviates from the log-linearity with an increase 
only one order of magnitude higher than that of G 8 in the wt 
background. This increase is less than expected since G 10 is two 
nucleotides longer than G 8 and hence expected to be considerably 
more unstable due to tract length dependent instability. Compar- 
atively there is an increase in median frequency of two orders of 
magnitude when the tract is increased with a single nucleotide 
from G 7 to G 8 . These observations may indicate an insertion bias 
in tracts G 7 , Gg and Gio. An insertion bias would increase the 
difference between G 7 and G 8 and decrease the difference 
between G 8 and Gi 0 since both G 7 and G 10 require a deletion 
to reach the closest ON-state whereas G 8 requires an insertion. 
The possible insertion bias finds further support when studying the 
frequencies in the MMR deficient background as discussed below. 
The deviation from log-linearity in switching frequencies observed 
for repeat tract lengths longer than G 10 are likely due to altered 
protein function. This phenomenon may be due to a reduction in 
fitness from decreased antibiotic resistance effectiveness of the 
AadA protein. Such an effect may in turn be ascribed to the 
differences in the number of transcribed amino acids from the 
polyG tract inside aadA. When Alexander and co-workers initially 
inserted the polyG tract in aadA, they replaced a non-homopol- 
ymeric pair of Gly codons (6 nt) with a homopolymeric and 
incomplete triplet of Gly codons (8 nt) [17]. aadA is not a PV gene 
in nature but a model locus applied here. The different ON repeat 
tract lengths in the coding region of aadA encode two {G^j, three 
(Gg and Og) or four (G 12 and 0 12 ) Gly- amino acids. The efficiency 
of the protein may be influenced by the number of amino acids 
encoded by the polyG region, with a drastically reduced efficiency 
when the tract encodes four amino acids, as such reducing the 
viability of the bacterial cells despite being in an ON-configura- 



tion. Structural predictions of homologous AadA in Salmonella 
enterica indicate that the PV tract resides in a fi-loop in between 
and in close proximity to conserved regions [67]. Increasing the 
tract length and thereby the number of amino acids may therefore 
cause subtle alterations in the three-dimensional protein config- 
uration that reduce, but notably do not abolish, adenylyl 
transferease activity. The survival frequencies monitored in the 
non-repeat controls supported this hypothesis, since the survival 
frequency of the 4 x Gly (0 , 2 ) differed from a frequency of 1 and 
that of the 3xGly (0 9 ). Further support for this hypothesis was 
obtained by monitoring the colony size as discussed below. 

The PV assay confirmed a well-established association between 
MMR and SSR instability [21,27,62]. The MutL-derived MMR 
deficiency was shown to increase switching frequencies 200-400- 
fold relative to wt within the G 3 -G 8 range. With a hmbR PV assay, 
Alexander and co-workers have previously shown a 51-59 fold 
increase in switching frequencies of a G 8 tract in a MMR deficient 
serogroup A Mc strain (IR4048) [20]. MMR contribution is 
considerably more pronounced (200-400-fold) in our observations 
using a different serogroup A strain (Z1099) and a different PV 
assay. Some of this difference may possibly be ascribed to the PV 
models in that the hmbR PV assay is less stringent than the Spc 
assay. However, differences in MMR efficacy in separate strains of 
variable genetic composition and context cannot be excluded. It is 
therefore interesting that a high incidence of MMR mutator 
strains in epidemiological isolates has been reported [45]. The 
strain used here (Z1099) may not accurately represent the whole 
spectrum of pathogenic Mc strains, but allude to the great 
variation in MMR influence on SSR instability. 

In yeast, functional MMR has been shown to reduce insertion 
events in repeat tract lengths causing a deletion bias [59], as well as 
strongly affecting the fidelity of long repeat tract lengths [24]. 
Comparison of the PV rates in this study (Table 3) does not 
indicate that a deletion bias is alleviated in the MMR deficient 
background. On the contrary, a potential insertion bias, like in the 
wt background, is indicated in comparing G 8 and Gio in the 
MMR deficient background (Figure 4). The median switching 
frequency of the shorter G 8 requiring an insertion was near 20-fold 
higher than the longer Gio requiring a deletion. As for the wt 
background, we would expect the Gio frequency to greatly exceed 
that of G 8 due to an increase in tract length dependent instability. 
An insertion bias for G 8 (and Gg) has also been documented in C. 
jejuni [30]. Furthermore, the difference in G 10 median switching 
frequency of the wt and MMR deficient strains are considerably 
smaller (and not statistically significant) compared to the respective 
statistically significant differences for G 7 and G 8 . Also it is notable 
that the frequencies of G 7 and G 10 (both requiring a deletion) in 
the MMR deficient background are very similar. These observa- 
tions seem to suggest that, irrespective of a potential insertion bias, 
the influence of MMR deficiency on tract instability becomes less 
pronounced as tracts get longer as modelled in Figure SI. This 
notion is further supported by the lack of difference between 
backgrounds in G u and G 13 strains, although their respective 
frequencies are confounded by impaired growth. 

When comparing the colony sizes of strains with different ON- 
tract lengths when grown on selective media (Spc), a highly 
significant negative correlation was documented between ON- 
tract length and colony size (Table 4 and Figure 5). A similar 
highly significant tendency was detected also in the non-repeat 
tract control strains (Og and O i2 ). The tendency for reduced 
colony size with increased tract length, could potentially be 
alleviated by reducing the selective pressure mediated by Spc, 
since colony sizes were equal in the absence of selection. Apart 
from the tract length, the strains were identical and the colony 
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phenotype therefore seems exclusively associated with the drug 
resistance mechanism, inactivation of Spc by adenylylation, which 
is a characteristic of AadA. The original non-manipulated AadA 
has 2xGly, and the manipulated 2xGly strains monitored here 
give rise to the largest colonies, indicating the highest relative 
fitness of the panel. The fitness reduction in the 3 xGly and 4 xGly 
strains may be the most important factor to influence the PV assay 
in the longer tract range, overwhelming the monitored other cis 
(length) and trans acting (MMR) factors. 

Conclusions 

A linear mathematical model of SSR dynamics allowed the 
study of how individual factors influenced evolution, alone and in 
concert with others. Values of the tract length dependent factor a 
were found to inversely correlate to tract length evolution 
illustrating the interplay between fidelity of DNA replication and 
SSR evolution. The baseline instability factor /i, analogous to 
DNA repair deficiency, was shown a very powerful determinant of 
tract length evolution that, if exaggerated, had the potential to 
greatly reduce fitness of the population by forcing the generation 
of deleterious mutations. Theoretical scenarios were oudined 
describing how the relative contribution of MMR to the instability 
of individual tract lengths could differ. Some of these show how 
the relative contribution of MMR to tract instability decreases as 
the tract length dependent instability grows in accordance with 
observations of the in vivo model. MMR was shown to increase 
homopolymeric G tract instability 200-400-fold in strain Z1099, 
considerably higher than reported in other Mc strains. The 
mathematical model also illustrated how fluctuating selection for 
expression was required for the evolution of longer tracts and 
captured thereby the essence and dynamic rationale behind PV 
evolution. Molecular drivers of tract length evolution was 
modelled as insertion and deletion biases and found to drive the 
allelic distribution towards longer or shorter tracts, respectively. 
The experimental data suggested that tracts ^Gio experienced an 
insertion bias; the first documentation of such phenomenon in Mc. 
Finally, this study highlights the notable influence of protein 
function on the potential for evolvability by means of PV and the 
de novo evolution of intragenic SSRs. PV loci may seem subject to a 
balancing act between tract elongation and tract contraction 
defining a tract length space within the borders of complete fluidity 
and inflexibility. The improved understanding of the dynamics of 
SSR and PV may shed light on how individual PV loci and 
thereby antigenic epitopes evolve, which is relevant for population 
dynamics, host-pathogen interactions and ultimately for vaccine 
development. 

Supporting Information 

Figure SI Mathematical modelling of the relative 
contribution of a hypothetical MMR deficiency to tract 
instability. The increase in instability (y and S) with increasing 
tract length is shown for a hypothetical wt population (blue), a 
MMR deficient population with a values equal to that of the wt 
population (green), a MMR deficient population with relatively 
higher a values to the wt deficient population (red) and a MMR 
deficient population where the a values are equal to the relative 
difference between /? in the MMR deficient and the wt populations 
(turquoise). 
(TIF) 

Figure S2 Mathematical modelling of the evolution of 
homopolymeric tract length with an insertion bias. 

Allelic steady-state distributions as in Figure 2, with an insertion 



bias (5 — 0.8y). The different panels A-D illustrate the allelic 
steady-state distribution of tract lengths following different values 
of tract length dependent instability (a) and tract length 
independent instability (fi). Where K= 100, fi = 5, d s = 1 and d t = 5. 
(TIF) 

Figure S3 Mathematical modelling of the evolution of 
homopolymeric tract length with a deletion bias. Allelic 
steady-state distributions as in Figure 2, with a deletion bias 
(y = 0.8(5). The different panels A-D illustrate the allelic steady- 
state distribution of tract lengths following different values of tract 
length dependent instability (a) and tract length independent 
instability (/?). Where K= 100, fi = 5, d s = 1 and d t = 5. 
(TIF) 

Figure S4 Mathematical modelling of different scenar- 
ios of selection. Allelic steady-state distributions shown for 
different values of death rate/mortality d s and di as depicted in the 
first slide: without selection (case 1), selection ON (case 2) and OFF 
(case 3), tract length dependent death rate (case 4), and 
combinations of ON/ OFF selection and tract length dependent 
death rate (case 5 and 6). Panels A-D in each case 1-6 illustrate 
the allelic steady-state distributions of tract lengths following 
comparable values of tract length dependent instability (a) and 
tract length independent instability (/?). Where K= 100 and fi = 5. 
(ZIP) 

Figure S5 Time average of the mathematical modelled 
population with oscillating ON/OFF selection shown in 
Video SI. Allelic time average distribution of the mathematical 
modelled population with oscillating ON/OFF selection (as shown 
in Video SI). In the modelled population K= 100, fi = 5,d s =l and 
d[ = 5, where d s and d t oscillates for tract lengths that are multiple of 
three (6, 9, 12) and not multiple of three (5, 7, 8, 10, 11 & 13). 
(TIF) 

Figure S6 Colonies after 48 hours of growth on Blood 
Spc50 plates for the panel of in-frame tract lengths in wt 
and non-repeat controls. Depicting a selection of colonies 
representing the in-frame tracts G 6 (for G5/G7), G9 (for Gh/G 10 ) 
and G 12 (for G n /Gi3) an d the non-repeat repeat tracts O9 (9 nt) 
and 0 12 (12 nt), the white bars represent 10 mm. Tract length 
constructs as described in Table 1 and 2. 
(TIF) 

Table SI List of putative phase variable loci in three Mc 
strains. 

(DOC) 

Video SI Mathematical modelled oscillating ON/OFF 
selection. The video illustrates the temporal shifts in allelic 
distribution when a mathematical modelled population is 
subjected to oscillating ON and OFF selection (non-oscillating 
ON and OFF selection shown in Figure S4, case 2 and 3). The 
population does not reach steady-state equilibrium as long as it is 
under oscillating penalizing selection pressure, a time average of 
the oscillating allelic distributions is shown in Figure S5. In the 
modelled population, K= 1 00, fi = 5,d s =\ and di = 5, where d s and 
di oscillates for tract lengths that are multiple of three (6, 9, 12) and 
not multiple of three (5, 7, 8, 10, 11, 13). 
(MP4) 
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